Cloning Voronoi Diagrams via 
Retroactive Data Structures 



Matthew T. Dickerson 
Dept. of Math and Computer Sci. 
Middlebury College 



David Epp stein 
Dept. of Computer Science 
Univ. of California, Irvine 



Michael T. Goodrich 
Dept. of Computer Science 
Univ. of California, Irvine 



Abstract 



We address the problem of replicating a Voronoi diagram V(S) of a planar point set S by making 
proximity queries, which are of three possible (in decreasing order of information content): 

1 . the exact location of the nearest site(s) in S 

2. the distance to and label(s) of the nearest site(s) in S 

3. a unique label for every nearest site in S. 

We provide algorithms showing how queries of Type 1 and Type 2 allow an exact cloning of V (S) with 
0(n) queries and (9(«log 2 «) processing time. We also prove that queries of Type 3 can never exactly 
clone V(S), but we show that with O(nlogi) queries we can construct an e-approximate cloning of 
V (S). In addition to showing the limits of nearest-neighbor database security, our methods also provide 
one of the first natural algorithmic applications of retroactive data structures, as several of our methods 
critically rely on the use of such structures, one of which is new to this paper. 



1 Introduction 



Data protection is an important and growing concern in the information age. Many Internet services derive 
income from online queries of their databases; hence, they have an economic interest in keeping others from 
replicating their data and offering a competing service. That is, from a technological point of view, an online 
service's economic foundation depends on preventing others from cloning its data. Unfortunately, each time 
such a service answers a query, it leaks a little piece of its database. Thus, the information security of such a 
service can be characterized in terms of the existence of efficient algorithms that can exploit the data leakage 
present in each response to systematically extract information from that service to be able to replicate that 
service. 

In the algorithmic data-cloning framework ll24l . a data querier, Bob, is allowed certain types of queries 
to a data set S that belongs to a data owner, Alice. Once Alice has determined the kinds of queries that she 
will allow, she must correctly answer every valid query from Bob. Bob has no other way to access S except 
through these queries. The information security question, then, is to determine how many queries and how 
much processing time is needed for Bob to clone the entire data set. We define a. full cloning of S to mean 
that Bob can replicate Alice's API so that he can answer any validly-formed query as accurately as Alice 
could. In an e-approximate cloning of S, Bob can answer any validly-formed query to within an accuracy of 
£ > 0. Thus, from the perspective of algorithmic research, finding efficient data-cloning algorithms for Bob 
demonstrates increased information-security risks for Alice, and showing algorithmic lower bounds for Bob 
demonstrates improved security for Alice. 

In this paper, we are interested in data sets consisting of a set S of n points in the plane, where n and the 
contents of S are initially unknown. We study the risks to S when Alice supports planar nearest-neighbor 
queries on S. For example, Alice could represent the web site for a large coffee restaurant franchise, which 
provides the location of its nearest coffee restaurant given the GPS coordinates of a querier. We assume that 
all the sites in S are inside a known bounding box, B, which, without loss of generality, can be assumed to 
be a square with sides normalized to have length 1 . Since planar nearest-neighbor queries define a Voronoi 
diagram in the plane (e.g., see |[T5l "). we can view Bob's goal in this instance of the algorithmic data-cloning 
framework as that of trying to determine the Voronoi diagram of S inside the bounding box B. If Alice's 
API allows Bob to exactly determine the Voronoi diagram of S, then we assume Bob is interested in a full 
cloning of S. Otherwise, if an exact determination of the set S or a representation of the Voronoi diagram is 
impossible to determine, then Bob is interested in an £-approximate cloning of S, which, in this case, would 
be a determination of the Voronoi diagram of S to within a Euclidean distance of e > 0. Naturally, Bob 
wishes to clone S minimizing both the number of queries made and the running time to process the results 
of those queries. 

The difficulty of cloning Alice's database depends in part on the information she provides in response to 
a nearest-neighbor query point p. We consider three types of responses (in decreasing order of information 
content): 

1 . the exact location of the nearest site(s) to p in S 

2. the distance and label(s) of the nearest site(s) to p in S 

3. a unique label identifying each nearest site to p in S. 

Implicitly, all three query types request information about a site in S whose Voronoi region the query point 
p is contained within. Query 1, for example, asks for the exactly location and label of site q G S whose 
Voronoi region V(q) contains p, or if jo's nearest neighbor is not unique for the Voronoi edge or Voronoi 
vertex on which p falls. With all three cases, we want to know how difficult it is to compute the Voronoi 
diagram, or an approximation of it, from a set of queries. 
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1.1 Related Work 



Let us briefly review some related prior work. 

Bancilhon and Spyratos [7], Deutsch and Papakonstantinou ifTTl . and Miklau and Suciu 1131! provide 
general models for privacy loss in information releases from a database, called query-view security, which 
identifies the sensitive information as a specific secret, S. Attackers are allowed to form legal queries and 
ask them of the database, while the database owner tries to protect the information that these queries leak 
about the secret S. Note that we are instead considering a related, but different, quantitative measure, where 
there is no specifically sensitive part of the data, but the data owner, Alice, is trying to limit releasing too 
much of her data. 

There has been considerable recent work on designing technological approaches that can help protect 
the privacy or intellectual property rights of a database by modifying its content. For example, several 
researchers (e.g., see 0|2l|3]|4l[25]|37l|39l|38]]) have focused on data watermarking, which is a technique 
for altering the data to make it easier, after the fact, to track when someone has stolen information and 
published their own database from it. Alternatively, several other researchers ETl |34l |35l |30l [T] [12j @4j [8l 
have proposed generalization as a way of specifying a quantifiable secrecy-preservation requirement for 
databases. The generalization approach is to group attribute values into equivalence classes, and replace 
each individual attribute value with its class name, thereby limiting the information that can be derived from 
any selection query. Our assumption in this paper, however, is that the data owner, Alice, is not interested in 
modifying her data, since she derives an economic interest from its accuracy. She may be interested instead 
in placing a reasonable limit on the number of queries any one user might ask, so that she can limit her 
exposure to the risk of that user cloning her data. 

There has been less prior related work that takes the "black hat" perspective of this paper, which asks 
the question of how quickly a data set can be discovered from seemingly minimalistic responses to queries 
on that object. For example, Goodrich |[24l studies the problem of discovering a DNA string from genomic 
comparison queries and Nohl and Evans [ 32] study quantitative measures on the global system information 
that is leaked from learning the contents of multiple RFID tags that are generated by that system. 

Motivated by the problem of having a robot discover the shape of an object by touching it Ifl4l . there is 
considerable amount of related work in the computational geometry literature on discovering polygonal and 
polyhedral shapes from probing (e.g., see |[5j |6l |9j [TT] [T9j [2T1 [26j [33l |42l |41]]). Rather than review all of this 
work in detail, we refer the interested reader to the survey and book chapter by Skiena Il40ll43l . and simply 
mention that, with the notable exception of work by Dobkin et al. lfT9l , this prior work is primarily directed 
at discovering obstacles in a two-dimensional environment using various kinds of contact probes. Thus, 
although it is related, most of this prior work cannot be adapted to the problem of discovering a Voronoi 
diagram through nearest-neighbor probes. 

By a well-known lifting method (e.g., see lfT5l ). a 2-dimensional Voronoi diagram can be defined by a 
projection to the plane of the skeleton of a 3-dimensional convex polyhedron, which is defined by the upper 
envelope of planes tangent at point sites mapped vertically to a certain paraboloid. This property of Voronoi 
diagrams implies that the method of Dobkin et al. [19] for discovering a convex polytope via ray-shooting 
"finger" probes can be used to discover a Voronoi diagram in the plane using nearest-neighbor queries (for 
instance, the queries we call "exact queries"). Translated into this context, their method results in a scheme 
that would use In — 5 queries to clone a Voronoi diagram, with a time overhead that is 0(n ). 

In the framework of retroactive data structures |[T0l[T6ll23l . which are themselves related to persistent 
data structures |[T8l 12011361 . each update operation o to a data structure D, such as an insertion or deletion 
of an element, comes with a unique numerical value, t , specifying a time value at which the operation o is 
assumed to take place. The order in which operations are presented to the data structure is not assumed to be 
the same as the order of these time values. Just like update operations, query operations also come with time 
values; a query with time value t should return a correct response with respect to a data structure on which all 
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operations with t < t have been performed. Thus, an update operation, o, having a time value, t , will affect 
any subsequent queries having time values greater than or equal to t . In a partially retroactive data structure 
the time for a query must be at least as large as the maximum t seen so far, whereas in a. fully retroactive 
data structures there is no restriction on the time values for queries. Persistent data structures lfl~8l l20l 1361 
also allow for such "queries in the past," as with fully retroactive data structures, but in a persistent structure, 
D, all updates must either be for the current version of D or they must "fork" off an alternate version of D 
from a past instance of D. 

Demaine et al. [16] show how a general comparison-based ordered dictionary (with successor and pre- 
decessor queries) of n elements (which may not belong to a total order, but which can always be compared 
when they are in D for the same time value) can be made fully retroactive in 0{n\ogn) space and 0(log 2 n) 
query time and amortized 0(log 2 7i) update time in the pointer machine model. Blelloch [ 10] and Giora and 
Kaplan ll23ll improve these bounds, for numerical (totally ordered) items, showing how to achieve a fully 
retroactive ordered dictionary in 0(n) space and 0(\ogn) query and update times in the RAM model. These 
latter results do not apply to the general comparison-based partially-ordered setting, however. 

1.2 Our Results 

Given a set S of n points in the plane, with an API that supports nearest-neighbor queries, we show how 
queries of Type 1 and Type 2 allow an exact cloning of the Voronoi diagram, V(S), of S with 0(n) queries 
and 0(nlog 2 n) processing time. Our algorithms are based on non-trivial modifications of the sweep-line 
algorithm of Fortune ll22l (see also lfT5l ) so that it can construct a Voronoi diagram correctly in 0(nlog 2 n) 
time while tolerating unbounded amounts of backtracking. We efficiently accommodate this unpredictable 
backtracking through the use of a fully retroactive data structure for general comparison-based dictionaries. 
In particular, our method is based on our showing that the dynamic point location method of Cheng and 
Janardan [13] can be adapted into a method for achieving a general comparison-based fully retroactive 
ordered dictionary with 0(n) space, Oilogn) amortized update times, and 0(log 2 7i) query times. We also 
provide lower bounds that show that, even with an adaptation of the Dobkin et al. lfl9ll approach optimized 
for nearest-neighbor searches, there is a sequence of query responses that requires Q.{n 2 ) overhead for their 
approach applied to these types of exact queries. Nevertheless, we show that it is possible to clone V (S) using 
only 3n queries. We prove that queries of Type 3 can never exactly clone V(S), however, nor even determine 
with certainty the value of n = \S\. Nevertheless, we show that with nlog(^) queries we can construct an e- 
approximate cloning of V(S) that will support approximate nearest neighbor queries guaranteeing a response 
that is a site within (additive) e > distance of the exact nearest neighbor of the query point. Finally, we 
show that an approach based on using a quadtree to approximate the Voronoi diagram, which is a natural 
alternative approach, in this case results in a suboptimal approximation. 

2 A Fully-Retroactive General Comparison-Based Ordered Dictionary 

In this section, we develop a fully retroactive ordered dictionary data structure using 0(n) space, 0(log«) 
amortized update time, and 0(log 2 7i) query time, based on a dynamic point location method of Cheng and 
Janardan [ 13 ]. Our data structure only performs comparisons between pairs of items that are simultaneously 
active at some point in the history of the retroactive data structure. The main idea is to construct an interval 
tree, B, over the intervals between the insertion and deletion times of each item in the dictionary, and to 
maintain B as a BB[a]-tree 11291 . 

Each item x is stored at the unique node v in B such that jc's insertion time is associated with v's left 
subtree and x's deletion time is associated with v's right subtree. We store x in two priority search trees [28 ], 
L(v) and R(v), associated with node v. These two priority search trees are both ordered by the dictionary 
ordering of the items stored in them; all such items are active at the time value that separates v's left and 
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right subtrees, so they are all comparable to each other. The priority search trees differ, however, in how 
they prioritize their items. As with priority search trees more generally, each node in L(v) and R(y) stores 
two items, one that is used as a search key and another that has the minimum or maximum priority within 
its subtree. In L(v), the insertion time of an item is used as a priority, and a node in L(v) stores the item that 
has the minimum insertion time among all items within the subtree of descendants of that node. In R(v), 
the deletion time of an item is used as a priority, and a node in R(y) stores the item that has the maximum 
deletion time within its subtree. 

An insertion of an item x in D is done by finding the appropriate node v of the interval tree and inserting 
x into L(v) and R(v), and a deletion is likewise done by deletions in L(v) and R(v). Updates that cause a 
major imbalance in the interval tree structure are processed by rebalancing, which implies, by the properties 
of BB[a]-trees [29], that updates run in 0(logn) amortized time. 

Queries are done by searching the interval tree for the nodes with the property that the retroactive time 
specified as part of the query could be contained within one of the time intervals associated with that node. 
For each matching interval tree node v, we perform a search in either L(v) or R(y) depending on the relation 
between the query time and the time that separates the left and right children of v. The search method of 
Cheng and Janardan lfT3l allows us to find the successor of the query value, among the nodes stored in L{v) 
or R(y) with time intervals that contain the query time, in time 0(log7i). The result of the overall query 
is then formulated by comparing the results found at each interval tree node and choosing the one that is 
closest to the query value. Thus, the query takes 0(log«) time to identify the interval tree nodes associated 
with the query time, 0(log 2 7i) to query each of logarithmically many priority search trees, and 0(log«) time 
to combine the results, for a total of 0(log 2 «) time. 

Theorem 1: One can maintain a fully-retroactive general comparison-based dictionary on n elements, using 
0(n) space, so that updates run in 0(logn) amortized time and predecessor and successor queries run in 
0(log 2 n) time. 

3 Exact Query Probes 

We begin our study of Voronoi diagram cloning with the strongest sort of queries — Type 1. Given a query 
point p, a Type-1 query returns the site q in S nearest to p, that is, it returns the geometric location of q, p's 
nearest-neighbor in S. In the event that p has more than one nearest neighbors in 5, all nearest neighbors 
are returned. We show that only 0(n) queries and 0(n\og 2 n) processing time is needed to completely clone 
V(S) — which, as implied, also means we explicitly have determined both S and n. 

In this section, we present an algorithm that accomplishes this task and we argue that the algorithm is 
correct and requires only 0(nlog 2 n) time. We show that the number of probes needed is less than An. 

Overview of Our Algorithm. Our algorithm is adapted from the plane sweep Voronoi diagram algorithm 
of Fortune ll22l . with a significant modification to allow for unbounded and unpredictable amounts of back- 
tracking. The fundamental difference is that the Fortune algorithm begins with the set of sites, S, completely 
known; in our case, the only thing we know at the start is a bounding box containing S. Using the formula- 
tion of de Berg et al. 031. Fortune's algorithm uses an event queue to controls a sweep line that moves in 
order of decreasing y coordinates, with a so-called "beach line" — an x-monotone curve made up of parabolic 
segments following above the sweep line. The plane above the beach line is partitioned into cells according 
to the final Voronoi diagram of S. There are two types of events, caused when the sweep line crosses point 
sites in S and Voronoi vertices in V(S); the latter points are determined as the algorithm progresses. In our 
version, we need to find both the sites and the Voronoi vertices as the plane sweep advances. And because 
not all sites are known in advance, we will need to verify tentative Voronoi vertex events as we sweep 
across them, at times backtracking our sweep line when our queries reveal new sites that invalidate tentative 
Voronoi vertices and introduce new events that are actually above our sweep line. We will show that each 
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query discovers a feature in the Voronoi diagram, that the number of times we backtrack is bounded by the 
number of these features, and these facts imply that the number of queries and updates we perform in a 
retroactive dictionary used to implement our sweep-line algorithm is 0(n). In fact, we will prove that the 
number of probes is at most An. 

We begin with an overview of our algorithm. The algorithm begins by finding all the Voronoi regions 
and edges that intersect the top edge of the bounding box, B. If there are k such regions (and thus k — 1 
edges), this can be accomplished in 0[n) time with 2k— 1 queries. This step initializes our event queue with 
k of the point sites in S. 

The algorithm then proceeds much as the Fortune algorithm, but with the following two important 
changes. Whenever we reach a point site event for some site q £ S (i.e., when q is removed from the event 
queue), we do a nearest-neighbor query on the point of the beach line directly above q — that is, the point 
with the same ^-coordinate as q and a y coordinate on the beach line that exists for the time value when 
the sweep line hits q. The position of this query point can be determined by using a retroactive dictionary 
queried with respect to the components of the beach line for the time value (in the plane sweep) associated 
with the point q. (See Fig. [T]) Querying this point will either confirm a Voronoi edge known to be part of 
the final Voronoi diagram (in which case we proceed with the sweep) or it will discover a new site r in S (in 
which case the sweep line restores point q to the event queue and backtracks to r). 

The second type of event is a tentative Voronoi vertex event. We do a nearest-neighbor query at the point 
believed to be a Voronoi vertex, which either confirms the vertex and all of its adjacent Voronoi edges above 
it, or it discovers a new site in S. This new site must be above the sweep line: at the time that Fortune's 
algorithm processes a Voronoi vertex event its sweep line must be as far from the Voronoi vertex as the three 
sites generating the vertex, so undiscovered sites below the sweep line cannot be nearest neighbors to the 
tentative vertex. If the algorithm discovers a new site above the sweep line, then again we backtrack and 
process that new site. In either case the Voronoi vertex is removed from the event queue — either added to 
the Voronoi diagram being constructed as a validated vertex, or ignored as a false vertex. We give the details 
in an appendix. 

Correctness and Complexity. Both the correctness and the analysis of this algorithm make use of the 
following important observation. Though the algorithm backtracks at certain "false" events — or tentative 
events that are proven false — it never completely removes any Voronoi components that have been confirmed 
by probes. Voronoi edges can only be added in two ways: the addition of a new site that creates one new 
edge, or the addition of a Voronoi vertex that terminates two edges and creates one new edge. In both 
cases, the edge is verified as an actual edge using a query before it is added to the Voronoi diagram being 
constructed, thus the diagram never contains edges that could later be falsified. (See Fig. [2]) 

The insertion of a new site begins a new edge directly above it, where the parabola of the site being 
added to the tree T — a degenerate line-segment parabola at the instant it is added — intersects the existing 
parabola above it, thus replacing one leaf in the tree with three. But before this site is inserted with its edge, 
the edge is tested with a query into the existing Voronoi diagram. The other time an edge may be added is 




Figure 1 : Illustrating the sweep-line algorithm for constructing a Voronoi diagram. 
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at a Voronoi vertex where two existing edges meet and a third new one is created. But all tentative Voronoi 
vertices are also verified by queries before they become circle events. 

Because of these verifying queries, the correctness of the algorithm follows from the correctness of the 
original Fortune algorithm and the fact that every tentative Voronoi vertex is verified by a query before being 
added to the diagram. Verifying every vertex in turn implies that every site in S is discovered. In particular, 
suppose we constructed a Voronoi diagram V(S) for some set S' that was missing at least one site from 5. 
(Since queries never return non-existing sites, S' cannot have any additional sites not in S.) Let p € S be any 
such site missing from S'. At least one Voronoi vertex in V(S') must be closer to p than to any site in S'. This 
follows, for example, from the fact that every Delaunay triangle has an empty circumcircle corresponding to 
a Voronoi vertex, so at least the Delaunay triangle containing p would have a corresponding Voronoi vertex 
closer to p. And so a query on that vertex would return p as the closest site. 

There are a few key observations that will lead to the analysis of the algorithm's run time and total 
number of queries. First, the sweep line will only backtrack when a new site in S is discovered, and so 
there are at most n backtracks. Second, every time we have a tentative Voronoi vertex that turns out to be 
unverified — that is, an event that turns out not to be part of the final diagram — we have also discovered a 
new site in S, and thus we have at most 0(n) phony events that are processed. It follows that the run time of 
the algorithm is asymptotically equivalent to the original Fortune algorithm, modulo the time needed for our 
retroactive data structure queries. Furthermore, after our initialization stage, every nearest-neighbor query 
either finds a site, verifies a site by looking at the Voronoi edge above it, or verifies a Voronoi vertex. The 
initialization, as noted, requires 2k — I queries if there are k sites initially discovered. Queries discover n — k 
more sites, verify n sites, and find at most 2n — 2 — k Voronoi vertices, for a total of 4n — 3 which is less than 
4n queries. 

The algorithm requires the same run time as the original Fortune plane sweep except for the processing 
of the tentative Voronoi vertices that prove to be phony, and all the backtracking (which is implemented 
using our retroactive dictionary). As noted, there are 0(n) of these backtracking steps, since these can only 
occur once for each previously undetected site in 5. So the overall number of updates and queries in our 
sweep-line- with-backtracking algorithm is 0(n); hence, the running time of our algorithm is 0(nlog 2 n). 
Thus, we have the following. 

Theorem 2: Given a set S of n points in R 2 , we can construct a copy ofV(S) using at most 4n Type-1 
queries and 0(nlog 2 n) time. 

3.1 An Alternate Algorithm Using More Time and Fewer Queries 

It is possible to further limit the number of queries at the cost of increased worst-case computational com- 
plexity. If n > 1, we can also exactly clone V(S) with at most 3n queries if we are willing to spend worst 
case 0(n 2 ) time implementing this alternative. 

This second algorithm follows an incremental construction paradigm, based on the general approach of 




overextended sweep line 



Figure 2: Backtracking the sweep-line. 
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Dobkin et al. [19] for discovering a 3 -dimensional convex polyhedron using finger probes. The alternative 
algorithm begins by querying each of the four corners of the bounding box. There are three cases to consider: 
these probes may discover one, two, or more than two sites in 5. If we discover more than two sites, then 
we construct the Voronoi diagram of all 3 or 4 of the sites discovered by these four queries, but we mark 
each Voronoi vertex as tentative and put it into a queue. The algorithm then proceeds as follows until the 
queue is empty. Remove a tentative Voronoi vertex from the queue, and query it. If the query reveals that 
it is a Voronoi vertex — that is, it has the three expected nearest neighbors — then we confirm the vertex and 
continue. If it is not a Voronoi vertex, then it must be closer to some previously undiscovered site in S, that 
will be returned by the query. We add that site to our list of known sites and update the Voronoi diagram in 
worst case 0(n) time using incremental insertion. When the queue is empty, we have a complete Voronoi 
diagram. Every probe except possibly one of the four corner probes discovered either a new site in S or 
confirmed a Voronoi vertex, and so the total number of queries is at most n + (2n — 5) + 1 < 3n. If the four 
corner queries discovered only two sites, then we compute the Voronoi edge that would be shared by these 
two sites if they were the only two sites in S, and we query both intersections of this edge with the bounding 
box. If we confirm both edges, then n = 2 and we are done. We have used 3n = 6 queries. If at least one of 
these two additional queries discovers another site, then we have at least three known sites and we proceed 
as with the previous case. Every query except at most three of the initial queries either confirmed a Voronoi 
vertex or discovered a new point site, and so the total number of queries is at most n + (2n — 5) + 3 < 3n. 
All four corners will belong to the same Voronoi region if and only if there is only one site in the bounding 
box, in which case 4 queries was sufficient. Thus, we have the following. 

Theorem 3: Given a set Sofn points in R 2 , we can construct a copy ofV(S) using at most 3n + 1 Type-1 
queries and &(n 2 ) time. 

Proof: We have already established the quadratic upper bound. For the sake of a lower bound, imagine that 
we have a set S' of n/2 points on the bottom boundary of B, all within distance 8 of the point (0,0), for 
a small parameter 8 with < 8 < 1/2". These points, by themselves, construct a Voronoi diagram with 
parallel edges. Suppose further that there is a single point, po = (8, 1 — 8), near the top boundary of B. The 
Voronoi region for po intersects the Voronoi region of every point in S'. The above algorithm therefore, after 
discovering the boundary points in S', would next query a vertex on the Voronoi diagram V(S') of 33, which 
will discover po. Next it will probe at vertex that is equi-distant to po and a point in S'. Suppose that this 
probe discovers a point p\ = (8, 1 /2). This point is closer to every point in 5" than po\ hence, updating the 
Voronoi diagram to go from V(S' U {po}) to V(S' U {po,Pi}) takes Q.(n) time. Now, suppose querying a 
tentative vertex of V(S' U {po,Pi}), which will be equi-distant from p\ and a point in 5', discovers a point 
P2 = (8, 1 /2 2 ). Again, updating the Voronoi diagram takes Q.(n) time. Suppose, therefore, that we continue 
in this way, with each newly-discovered point p\ = (8, 1/2') requiring that we spend Q.(n) time to update 
the current Voronoi diagram. After discovering p n / 2 _ i , the rc-th point in the set S = S' U {po, p\ , . . . , p n n- 1 } , 
we will have spent Q. (n 2 ) time in total to discover the Voronoi diagram V(S). ■ 

4 Distance Query Probes 

We next consider our Voronoi diagram cloning algorithm for the case when we use use only distance query 
probes — probes that return the distance to and label of the nearest site(s). Our method follows a similar 
approach to the backtracking method we used above. The main difference is that we require extra work to 
determine the exact location of the sites in S. 

We begin by describing how we can find those sites whose Voronoi regions (and thus also edges) inter- 
sect the top boundary of the bounding box where the sweep-line begins. We will speak of a probe circle as 
the set of possible locations of a site returned by a probe p: it is the circle of radius d centered at p, where d 
is the distance returned to the nearest site. 
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Initializing the Sweep Line. We begin the initialization process by probing at the two top corners of the 
bounding box. If both probes return the same site p, then by convexity of Voronoi regions the entire top 
edge of the box belongs to the Voronoi region V(p). Furthermore, both probes also return a distance d to 
the site p, and so p must fall on an intersection of two circles of radius d centered at the two corner probe 
locations. Since one of these intersections is above the bounding box, the remaining intersection gives the 
exact location of p. 

Assume that the two corner probes pi and p r on the left and right respectively return different sites qi 
and qR respectively. We know the distance pi to qi and pr to q^, but we don't know the exact locations of 
the two sites, nor do we know if there are any other sites with regions intersecting the bounding box. The 
segment between these two probes is therefore not fully classified. We will describe a recursive procedure 
for classification. 

Let L be an unclassified segment, with the probes p\ and p r on the left and right sides of the segments 
respectively returning different sites qi and q r . First, we probe the midpoint p m of L. The probe returns either 
one of the two known sites, or a new site q m . If it returns a new site, then we divide L into two segments that 
are both unclassified, but which have classified endpoints qi, q m , and q r , and we recursively classify them. 

Suppose query p m returns one of the already known sites. If this probe returns q r , then we can imme- 
diately compute the exact location of q r from the two probes p r and p m , since only one of the intersections 
of the probe circles is inside the bounding box. We also have classified the segment L r between probes p r 
and p m as being fully inside the Voronoi region of p r . The other half of the segment, L/, however, is not 
classified; we only know the Voronoi regions of its endpoints are the regions of q r and qi. Here it would be 
tempting to again probe the midpoint of the segment L\ ; however that could lead to an unbounded number 
of probes as we repeatedly divide the segment in half because the midpoint of the remaining unclassified 
segment L/ could still belong to q r and so we would gain no new information about qi. What we do instead 
is use the known location of q r , which must be outside the probe circle at pi, to find a probe location pn 
close enough to p\ that it is guaranteed not to gives us q r . This is possible since the perpendicular bisector 
between q r (which is a known point) and any point on the probe circle from pi — which is the set of candi- 
date locations for qi — must fall between p m and pi on a finite segment computable in 0(1) time, and thus 
anything between that range and pi is closer to qi than to q r . 

So this new probe pn returns either qi or a new site. If it returns a new site, then we divide the un- 
classified segment into two unclassified segments and recursively classify them. If this new probe gives 
us qi then we now know the exact location of qi from two probes. From the exact locations of q r and qi, 
we can compute and probe where their Voronoi edge ought to cross the bounding box, either confirming 
that Voronoi edge — which means that the entire edge is now classified — or we discover a new site. If the 
probe gives us a new site, then again we divide the unclassified segment into two unclassified segments and 
recursively classify them. 

Lemma 4: The initialization stage for the sweep line requires 0(k) time and 3k — I probes where k is the 
number of sites whose Voronoi regions intersect the top of the bounding box. 

Proof: Note that every probe either identifies a previously undiscovered site (k probes), provides a second 
probe with more information on an already discovered site enabling the exact location of this site to be 
computed (k probes), or confirms a Voronoi edge (k — 1 probes for k regions). So the total number of 
probes in this section is 3k — 1 where k is the number of sites whose Voronoi regions intersect the top of the 
bounding box. Each probe is processed in 0(1) time. ■ 

Processing the Sweep Line. The previous subsection explains how to initialize the sweep line. The al- 
gorithm making use of distance-only queries now proceeds as with the exact query probe version of the 
previous section, except that a slightly different approach requiring more probes will be needed to process 
tentative site events. 
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As with the algorithm of the previous section, there are two types of tentative events: a tentative Voronoi 
vertex for three known sites, and a tentative Voronoi edge that falls directly above a known site and is 
determined by one other known site. Both of these events need to be verified by probe-that is, we need to 
determine if these events are actually real, or whether there is some other site closer to the events. In both 
cases, we use a probe p where the tentative Voronoi feature should be. If that problem returns the correct 
three or two site labels (at the correct distance), then the verification is complete, and we proceed as with 
the algorithm of the previous section. 

However, these probes may discover a new site q; in this case, they give only the distance to that site 
and not its actually location. We need two more probes that return the same site in order to discover its 
exact location — but these probes may instead return yet other new sites. We now describe how to choose the 
locations of these probes so that no work is wasted, and each probe either verifies a Voronoi vertex, verifies 
a Voronoi edge above a known site, or is one of three probes that exactly locates a site. 

Let p\ be a probe during the sweep line, that attempts to verify a Voronoi vertex or edge, and instead 
discovers a new site q\ that was not previously known. Let d = d(p\,q\) be the distance returned from 
probe p\ to its site q\, and let e be the distance from p\ to the nearest previously known sites — that is, the 
two or three sites whose tentative Voronoi vertex or edge it was seeking to verify. Since probe p\ returned 
q\, we know that d < e. Let pi be any probe location such that d(pi,p2) < By the triangle inequality, 
we know d(p2,q\) < d + ^ = while d(p2,r) > e — = e -^-, where r is any of the two or three 
previously known closest sites to p\. It follows immediately that probe p2 cannot return any previously 
known site except q\ which was first discovered by probe p\. We can choose any probe location meeting 
this restriction, d(pi,p2) < which is computable in 0(1) time. 

So there are two possibilities with probe p%: either it returns site q\ again, or it returns a new site q2- 
If P2 returns q\, we now have two probes returning that site, and distances to that site, so its location is 
one of at most two intersections between the two probe circles. We can now probe either one of those two 
intersections, and from the result we determine the exact location of q\ because the probe either returns q\ 
at distance 0, or it returns some other site, or it returns q\ at a distance > 0. 

If p2 returns a new site q2, then we now have two sites that have been discovered, but whose exact 
locations are not known. We can discover the exact location using the recursive method of the previous 
subsection, treating the segment P1P2 as an unclassified segment, probing its midpoint, and continuing. 
However, once we have received a site as the result of two probes, we still require a third probe to exactly 
locate it since both intersections of the first two probe circles might be inside the bounding box. 

Lemma 5: Processing the remaining events (after the initialization) for the sweep line requires at most 
6n — 3k — 5 probes where k is the number of sites whose Voronoi regions intersect the top of the bounding 
box. 

Proof: There are n — k sites to be discovered, n sites that need to have an edge verified above them, and 
at most 2n — 5 Voronoi vertices in the Voronoi diagram of n sites. Every probe accomplishes one of five 
things: it verifies a Voronoi vertex (2n — 5 probes), verifies a Voronoi edge directly above a site (n probes), 
or is one of exactly three probes used to discover and then exactly locate a new site (3(n — k) probes.) The 
total number of probes required is therefore at most 6n — 3k — 5. ■ 

Correctness and Analysis. The correctness of the algorithm follows from the same argument as in the 
previous section, as long as we know that three probes suffice to determine the exactly location of a site. 
The analysis of the run-time is straightforward. 

Theorem 6: Given a set Sofn points in R 2 , we can construct a copy ofV(S) using at most 6n — 6 Type-2 
queries and 0(nlog 2 n) time. 

Proof: The bound on the number of probes follows directly from Lemmas[4]and[5] Since we can compute the 
exact location of each site with three probes instead of one, the run time of these algorithm is asymptotically 
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equivalent to that of the previous section. 



5 Label-Only Query Probes 

In this section we explore what information can be gained from the weakest of our queries (Type-3): given 
a query point p, the query returns only the label of the site q in S that is nearest to p, but no location or 
proximity information about site q. Using queries of the third type, it is impossible to exactly clone V (S) or 
even to determine with certainty the value of n = \S\. However even with this minimal query information, 
we construct an approximate Voronoi diagram V(S') which, without explicitly storing the locations of the 
sites in S, will still support later arbitrary approximate proximity queries to V(S). That is, for a new query 
point p' — not necessarily one of the query points used to construct the data structure — V(S') will be able to 
determine a site from S that is within e of the distance to the nearest neighbor of p', where £ can be made 
arbitrarily small. 

We describe first an approach that uses a common geometric search structure, the quadtree, and then 
show how we can improve upon these results and compute a Voronoi diagram. Neither approach is guaran- 
teed to find all the sites in S, but both will distinguish between any two sites separated by at least e. Let ./V 
be the number of sites found. The quadtree approach uses 0(N^) queries and time, while our approach of 
cloning an approximate Voronoi diagram requires 0(N log(^)) queries and 0(#(logiV + log(±))) time. So 
the Voronoi diagram cloning approaches reduce the multiplicative factor from ^ to log(^). 

First, however, we show the impossibility of constructing an exact clone. 

5.1 Privacy Under Label- Only Queries 

The main result of this subsection is the following lemma, the proof of which is straightforward but is 
included here for completeness. 

Lemma 7: Given a set S of n planar points in a bounding box of area greater than 0, using any finite 
number of queries p that return only the unique label q of the closest site (or sites ) in S to p, it is impossible 
to determine with certainty the value ofn = \S\. 

proof: Since the set of query points is finite, there must be non-empty areas of the plane (or bounding box) 
containing no query points. Let d be any such disc containing no query point. Suppose the set S has a site 
qo in the center of d, and sites qi,q2, • • • ,<?6 on the boundary of d at intervals of 7T/3. Any query outside of 
d — which by assumption includes all the query points — will be closer to one of q\ , q2, ■ ■ ■ ,qe than to go and 
thus no query point will discover qo- 

In fact by spacing 7 sites around the disc d, we can "hide" an arbitrary number of sites inside of d that 
will not be discovered by any query. This leads to the following even stronger lemma. 

Lemma 8: Given a set Sofn planar points in a bounding box of area greater than 0, using any finite number 
of queries p that return only the unique label q of the closest site (or sites) in S to p, letN be the number of 
distinct sites returned by the queries. It is possible thatN/n < e for any positive constant e. 

That is, the number of sites in S discovered by queries — and included in any cloned Voronoi diagram — 
no matter how many queries are used, could be any arbitrary small fraction of the actually number of points 
in 5. Phrased the other way, we could miss any arbitrary large percentage of the sites in S. The following 
corollary also follows directly from this. 

Corollary 9: Given a set Sofn planar points in a bounding box of area greater than 0, it is impossible to 
exactly compute (or clone) V(S) using any finite number of queries p that return only the unique label q of 
the closest site (or sites) in S to p. 
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5.2 A Quadtree Approximation Approach 

Before looking at how to explicitly construct an approximate Voronoi diagram from queries of the third sort, 
we note that we could also use queries to construct a standard quadtree that would support £-approximate 
nearest neighbor searches, and could be used to construct an £-approximate Voronoi diagram. 

The algorithm for constructing a quadtree using queries on the set of unknown sites is very similar to 
the standard algorithm for constructing a quadtree for a known set of sites. We begin with a bounding box, 
and query each of its corners, labeling them with the nearest site from S returned by our query. If all four 
corners are labeled with the same nearest site q G S, then by convexity of Voronoi regions the entire box is 
inside the Voronoi region for q — or, alternately, q is the nearest neighbor in S to every point in the box. 

If the corners of the bounding box have more than one label, we recursively subdivide it by making five 
new queries: one on the center of each side, and one in the center of the rectangle. The box now has 9 labels 
from queries, which can be used as the 4 corners of 4 separate boxes. The original box has a pointer to 
each of its four child boxes: upper left, upper right, lower left and lower right. Any of these smaller boxes 
that has corners with different labels gets recursively subdivided again into four children. This process will 
continue indefinitely. We stop when any bounding box has a diagonal of length less than a fixed constant e. 

Unfortunately, the run-time, depth of the quadtree, and number of queries required, are all inversely 
related to e rather than to the number of points in S, and the algorithm can never determine with certainly 
the size n of the set S. The number of probes and the time to construct the tree is 0(N^), where N <nis the 
number of sites discovered by the algorithm. The depth of the quadtree we construct is 0(\og(^)). 

5.3 A Voronoi Diagram Approach 

We now show that an approximate Voronoi diagram can be constructed to answer nearest-neighbor queries, 
with a probing process that uses 0(iVlog(l/e)) queries and 0(A r (logA^ + log(l/£))) time, where N <n 
is the number of discovered sites in S. (Any two sites separated by at least £ will be distinguished and 
discovered.) The main idea of the algorithm is to build an approximation to the Voronoi cell of each known 
site, using 0(log(l/e)) queries per feature of the cell. This sequence of queries either finds a sufficiently 
accurate approximation for the location of that feature or discovers the existence of another site label. We 
begin by querying each corner of our bounding box to find the label of the site in whose region that corner 
belongs. For any side of the box whose corners are in different Voronoi regions, we do a binary search to 
find, within a distance of e 2 , the edge of the Voronoi region for each different site. This may discover new 
Voronoi regions. For each new region discovered, we also do a binary search to discover its edges to within 
e 2 . Each binary search requires 0(log ^) queries and time. The result is an ordered list of Voronoi edges 
crossing each side of the bounding box. 

A second similar search a distance of 2s from each side of the bounding box will find the same Voronoi 
edges — or will discover a new Voronoi region, indicating that the Voronoi edge has ended. For those Voronoi 
edges that have not ended within 2e, we compute an approximation of the line containing the Voronoi 
edge — that is, the perpendicular bisector of the two sites whose labels we know. An argument using similar 
triangles shows that our approximation of this edge is accurate enough that we can determine to within a 
distance of < £ where this edge crosses the far boundary. We do a doubling search of queries out along 
each discovered Voronoi edge, and then a binary search back once we have moved past the end of the edge, 
to find where it ends. (For those edges ending within 2s of the edge of the box, we use a constant number 
of queries in a circle of radius £ around the edge to find where it ends, or to determine that the edge is too 
short to be included in our approximation.) Thus, three binary searches of 0(log(^)) queries and time each 
suffice to discover complete approximations of each Voronoi edge intersecting the bounding box, including 
an approximate location of the Voronoi vertex terminating these edges. In the worst case, our approximation 
is within £. A constant number of queries in the vicinity of each Voronoi vertex will discover the other edge 
or edges coming out of the Vertex. We repeat this process for each new Voronoi edge as it is discovered, until 
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every Voronoi edge has both ends terminated at Voronoi vertices, at which time the approximate Voronoi 
diagram is complete. 

Theorem 10: Given a set Sofn points in R 2 , we can construct a planar subdivision, V', using 0(N log ^) 
Type-3 queries and 0(N (log N + log ^)) time, where N < n is the number of discovered sites in S, such that 
any two sites separated by at least e will be distinguished and discovered and each point on the 1 -dimensional 
skeleton ofV is within distance e of a point on the 1-dimensional skeleton of the Voronoi diagram, V(S), of 
S. 

6 Conclusion and Open Problem 

We have given a number of efficient algorithms for cloning Voronoi diagrams, under various assumptions 
about the types of queries that are allowed. Our methods improve those that would be implied by using 
existing polytope probing strategies, with several being based on the use of a novel retroactive dictionary 
implementation. We leave as an open problem whether it is possible to efficiently clone a Voronoi diagram 
given only approximate distance queries. 
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A Algorithmic Details for Method with Type-1 Queries 

In this appendix we provide more detail to our algorithm to exactly clone a Voronoi diagram using Type- 1 
queries, following the de Berg et al. [15] presentation of Fortune's sweep-line algorithm ll22l . which is based 
on using a beach line made up of parabolic arcs instead of hyperbolic arcs (readers are referred to de Berg 
et al. for the complete details). 

The data structures consist of the Voronoi diagram V(S) itself that begins empty and is constructed 
during the sweep, a priority queue Q of events (allowing for deletion of arbitrary events), and a retroactive 
dictionary, T, to keep track of the active sites in the beach line at each time of the plane sweep. For each 
time value (which is determined by a ^-coordinate of a point event or Voronoi vertex event), T is searchable 
by ^-coordinate at any time for which the beachline exists. Our algorithm makes queries into a black box 
that has complete information about S and represents the actions of the data owner, Alice. The algorithm 
follows: 

• Initialization: Using as initial queries the top left and top right corners of the bounding box, pi and 
p r respectively, determine the nearest neighbors in S to p\ and p r . Let the results of these queries be 
sites qi and q r respectively. If qi = q r — that is, both corners pi and p r have the same nearest neighbor 
q = qi = q r G 5 — then only one Voronoi region V (q) (and no Voronoi edges) intersects the top of the 
bounding box. Add site q to the event queue Q, terminate the Initialization stage, and go to the Plane 
Sweep stage of the algorithm. 

Initialization Recursive Step: If qi ^ q r — that is the two corners are in different Voronoi regions — 
then we recursively subdivide as follows. Put sites qi and q r in Q. Then compute (in 0(1) time) the 
perpendicular bisector E of the segment qiq r on which lies the Voronoi edge between regions V(qi) 
and V(q r ), if these regions share an edge. Let p m be the intersection of E with the top of the bounding 
box. (This intersection must exist, since the two corners are in different Voronoi regions.) Query p m . 
If qi and q r are the nearest neighbors of p m , then we are done. If not, let q m (with q m ^ qi and q m ^ q r ) 
be the nearest site in S to p m . Add q m to Q, and make two recursive calls to this step, one with pi and 
p m taking the roles of pi and p r above, and one with p m and p r taking the roles of p\ and p r . 
At the end of the initialization, every site whose Voronoi region intersects the top of the bounding box 
will be in the event queue. We now carry out the following plane sweep using the event queue. 

• Plane Sweep 

while the event queue Q is not empty do 

Remove the event with the largest y-coordinate. 

It is either a SiteEvent /?, or a Tentative VoronoiVertex 

if the event is a site event then 

HandleSiteEvent(/?,) 
else 

HandleTentativeVoronoiVertex(<7,-) 
The difference between this and the standard Fortune plane sweep is how the events are handled, as 
described below. 

• HandleSiteEvent(^ ; ): 

Search in T for the site pj whose region contains the jc-coordinate of /?, when the sweep-line is at 
the j-coordinate of p,. (Since pj is already in T, it must have a ^-coordinate larger than that of 
Pi.) 

To verify that we are ready to process p,- — that its tentative Voronoi edge directly above it is an 
actual Voronoi edge, and we did not miss any site between p, and p ; — compute the point q on the 
bisector of pipj that has the same x-coordinate as This would be a point on the Voronoi edge 
between pi and pj if that edge exists. 
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Query point q to find its nearest neighbor(s) in 5. 
if the two nearest neighbors of q are p\ and pj then 

Insert /?, into the retroactive dictionary, T. 

Add the verified edge ptpj to V(S) 

Complete processing this as a site event following the details of [15 ]. This includes c construct- 
ing tentative Voronoi vertices from the edge pipj and its neighbors, and adding them to Q if 
not already present. 
Delete p t from Q. 
else 

Leave p; in Q 

Let pk be the nearest neighbor of q. 

Add pk to Q. We then continue processing. Note that if p^ has a higher j-coordinate than /?,, 
then we have to backtrack our sweep line. 
We next describe how to handle a tentative Voronoi vertex (see Fig. [3]>. 




*A 

Figure 3: A Voronoi vertex event (A) that discovers a new site (B). 

• HandleTentativeVoronoiVertex(<7;) : 

Query point qi to determine if its three actual nearest neighbors are the three sites whose Voronoi 

regions generated this tentative Voronoi vertex. 

if the qi has the expected three nearest neighbors then 

Perform a deletion in T for beachline segment that ends at this vertex at this time in the sweep 

Add to V(S) the Voronoi vertex qi and the edges above it. 

Delete qi from Q. 

Complete processing this as a CircleEvent following the details of lfl5l . 
else 

Delete qi from Q. 

Add to Q all of the sites that are nearest neighbors of query point qi and continue. 
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